01. What are Policy Gradient Methods?

Policy gradient methods are a subclass of policy-based methods.

In the Introduction to Policy-Based Methods lesson, you learned about many policy-based methods that could approximate either a deterministic or stochastic policy.

In this lesson, we'll confine our attention to stochastic policies.

## Quiz

Which of the following is a valid approach, if we'd like to use a neural network to approximate an agent's stochastic policy (for a discrete action space)? (Select all that apply.)

SOLUTION:
  • Use a softmax activation function in the output layer. This will ensure the network outputs probabilities. For each state input, sample an action from the output probability distribution.
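
The approach in this solution can be sketched in a few lines. This is only a minimal illustration, not the lesson's implementation: the class name `SoftmaxPolicy`, the layer sizes, the `tanh` hidden layer, and the random untrained weights are all assumptions made for the example, and bias terms are left out for brevity.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxPolicy:
    """Hypothetical one-hidden-layer network mapping a state to action probabilities."""

    def __init__(self, state_size, action_size, hidden_size=16, seed=0):
        self.rng = random.Random(seed)
        # Small random weights; a real agent would learn these.
        self.w1 = [[self.rng.gauss(0, 0.1) for _ in range(state_size)]
                   for _ in range(hidden_size)]
        self.w2 = [[self.rng.gauss(0, 0.1) for _ in range(hidden_size)]
                   for _ in range(action_size)]

    def action_probs(self, state):
        # Hidden layer with tanh activation.
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)))
                  for row in self.w1]
        # Output layer: one logit per discrete action, then softmax.
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return softmax(logits)

    def act(self, state):
        # Sample an action index from the output distribution.
        probs = self.action_probs(state)
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


policy = SoftmaxPolicy(state_size=4, action_size=2)
state = [0.1, -0.2, 0.05, 0.3]  # made-up state vector
probs = policy.action_probs(state)
action = policy.act(state)
```

Because the action is sampled rather than chosen with an argmax, the same state can lead to different actions on different calls, which is what makes the policy stochastic.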

Which of the following is true about the difference between policy-based and policy gradient methods? (Select all that apply.)

SOLUTION:
  • Policy gradient methods are a subclass of policy-based methods.
  • Not all policy-based methods are policy gradient methods.
  • Both policy-based methods and policy gradient methods try to find the optimal policy directly, without maintaining value function estimates.
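
One way to picture the distinction in these answers: a policy gradient method estimates the gradient of the expected return with respect to the policy parameters and takes a step of gradient ascent. The symbols below are assumed notation for this sketch, not taken from the lesson: $\theta$ for the policy parameters, $J(\theta)$ for the expected return, and $\alpha$ for the step size.

$$
\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
$$

A policy-based method that searches over $\theta$ without estimating this gradient (hill climbing, for instance) is still policy-based, but it is not a policy gradient method.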